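Existing gait recognition methods either directly build a global feature representation (GFR) from the original gait sequences or generate a local feature representation (LFR) from several local parts. However, GFR tends to neglect local details of human posture as the receptive field grows in deeper network layers. Although LFR allows the network to focus on the detailed posture information of each local region, it neglects the relations among different local parts and thus exploits only the limited local information of a few specific regions. To address these issues, we propose a global-local gait recognition network, named GaitGL, to generate more discriminative feature representations. Specifically, a novel Global and Local Convolutional Layer (GLCL) is developed to take full advantage of both global visual information and local region details in each layer. GLCL is a dual-branch structure consisting of a GFR extractor and a mask-based LFR extractor. The GFR extractor aims to extract contextual information, such as the relationships among various body parts, while the mask-based LFR extractor is designed to exploit detailed posture changes in local regions. In addition, we introduce a novel mask-based strategy to improve local feature extraction. Specifically, we design pairs of complementary masks that randomly occlude feature maps, and then train the mask-based LFR extractor on the variously occluded feature maps. In this way, the LFR extractor learns to fully exploit local information. Extensive experiments show that GaitGL outperforms state-of-the-art gait recognition methods. Its average rank-1 accuracies on CASIA-B, OU-MVLP, GREW, and Gait3D are 93.6%, 98.7%, 68.0%, and 63.8%, respectively, significantly better than those of competing methods. The proposed method won first prize in two competitions: HID 2020 and HID 2021.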
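The complementary-mask strategy can be illustrated with a minimal sketch. It assumes a hypothetical splitting rule (cut the feature map at a random row) and hypothetical helper names `complementary_masks` and `apply_mask`; GaitGL's actual mask shapes and sampling scheme are not specified here. What it shows is that the two occluded views together cover every position exactly once, so the LFR extractor is trained on each local region under different occlusions.

```python
import random


def complementary_masks(height, width, rng=random):
    """Return two binary masks (lists of rows) that are exact complements.

    Hypothetical splitting rule: cut the map at a random row; mask_a keeps
    the rows above the cut and mask_b keeps the rest, so every position is
    visible in exactly one of the two occluded copies.
    """
    if height < 2:
        raise ValueError("need at least two rows to split")
    cut = rng.randint(1, height - 1)  # both masks keep at least one row
    mask_a = [[1 if r < cut else 0 for _ in range(width)] for r in range(height)]
    mask_b = [[1 - v for v in row] for row in mask_a]
    return mask_a, mask_b


def apply_mask(feature_map, mask):
    """Element-wise product: occluded positions are set to zero."""
    return [[f * m for f, m in zip(f_row, m_row)]
            for f_row, m_row in zip(feature_map, mask)]


# Usage: two complementary occluded views of one 6x4 feature map.
fmap = [[float(r * 4 + c) for c in range(4)] for r in range(6)]
mask_a, mask_b = complementary_masks(6, 4, random.Random(0))
view_a = apply_mask(fmap, mask_a)
view_b = apply_mask(fmap, mask_b)
```

Summing `view_a` and `view_b` recovers the original map, which is the property that makes the pair "complementary".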
For low-level computer vision and image processing ML tasks, training on large datasets is critical for generalization. However, the standard practice of relying on real-world images, primarily from the Internet, comes with image quality, scalability, and privacy issues, especially in commercial contexts. To address this, we have developed a procedural synthetic data generation pipeline and dataset tailored to low-level vision tasks. Our Unreal Engine-based synthetic data pipeline populates large scenes algorithmically with a combination of random 3D objects, materials, and geometric transformations. We then calibrate camera noise profiles to synthesize noisy images. From this pipeline, we generated a fully synthetic image denoising dataset (FSID) consisting of 175,000 noisy/clean image pairs. We then trained and validated a CNN-based denoising model and demonstrated that the model trained on this synthetic data alone achieves competitive denoising results when evaluated on real-world noisy images captured with smartphone cameras.
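Noise calibration and synthesis of this kind is often done with a signal-dependent Gaussian approximation of Poisson-Gaussian sensor noise, whose variance is a * x + b for clean intensity x. The sketch below assumes that model; it is not confirmed to be the FSID pipeline's noise profile. `fit_noise_profile` estimates a and b by least squares from (mean, variance) pairs measured on flat patches, and `add_calibrated_noise` draws noisy pixels from clean ones; both names and the example values are hypothetical.

```python
import math
import random


def fit_noise_profile(means, variances):
    """Least-squares fit of variance = a * mean + b over flat-field patches.

    a: signal-dependent (shot-noise) gain, b: signal-independent (read) noise variance.
    """
    n = len(means)
    mx = sum(means) / n
    my = sum(variances) / n
    sxx = sum((m - mx) ** 2 for m in means)
    sxy = sum((m - mx) * (v - my) for m, v in zip(means, variances))
    a = sxy / sxx
    return a, my - a * mx


def add_calibrated_noise(clean, a, b, rng=random):
    """Synthesize noisy pixels from clean ones in [0, 1].

    Assumed Gaussian approximation of Poisson-Gaussian noise:
    noisy = x + n, n ~ N(0, a * x + b), clipped back to [0, 1].
    """
    noisy = []
    for x in clean:
        sigma = math.sqrt(a * x + b)
        noisy.append(min(1.0, max(0.0, x + rng.gauss(0.0, sigma))))
    return noisy


# Usage with hypothetical flat-patch measurements.
a, b = fit_noise_profile([0.1, 0.5, 0.9], [0.0011, 0.0051, 0.0091])
noisy = add_calibrated_noise([0.5] * 20000, a, b, random.Random(42))
```

A real pipeline would fit one (a, b) pair per camera and ISO setting and apply it in the raw domain; this sketch only shows the shape of the calibrate-then-sample loop.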
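The combination of ordinary differential equations and neural networks, i.e., neural ordinary differential equations (Neural ODEs), has been widely studied from various angles. However, deciphering the role of numerical integration in Neural ODEs remains an open challenge, as many studies have shown that numerical integration significantly affects model performance. In this paper, we propose the inverse modified differential equation (IMDE) to clarify the influence of numerical integration on training Neural ODE models. The IMDE is determined by the learning task and the employed ODE solver. We show that training a Neural ODE model actually returns a close approximation of the IMDE, rather than of the true ODE. With the help of the IMDE, we deduce that (i) the discrepancy between the learned model and the true ODE is bounded by the sum of the discretization error and the learning loss; and (ii) Neural ODEs using non-symplectic numerical integration theoretically fail to learn conservation laws. Several experiments are performed to numerically verify our theoretical analysis.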
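A one-dimensional worked example illustrates the claim that training returns an approximation of a modified equation rather than the true ODE. Assume data from the exact flow of dx/dt = lam * x and a model f(x) = theta * x trained so that one forward-Euler step of size h reproduces each pair. The least-squares minimizer is theta = (e^(lam*h) - 1)/h = lam + h*lam^2/2 + O(h^2), not lam, and the gap shrinks as h decreases. This toy case is an illustration under those assumptions, not the paper's general IMDE construction; the function names are hypothetical.

```python
import math


def exact_pairs(lam, h, starts):
    """Training pairs (x_n, x_{n+1}) sampled from the exact flow of dx/dt = lam * x."""
    return [(x, math.exp(lam * h) * x) for x in starts]


def fit_euler_model(pairs, h):
    """Fit f(x) = theta * x so that one forward-Euler step matches the data.

    Minimizing sum (x + h * theta * x - y)^2 over theta gives
    theta = sum(x * (y - x)) / (h * sum(x * x)).
    """
    num = sum(x * (y - x) for x, y in pairs)
    den = h * sum(x * x for x, _ in pairs)
    return num / den


lam = -1.0
for h in (0.5, 0.1, 0.01):
    theta = fit_euler_model(exact_pairs(lam, h, [0.5, 1.0, 2.0]), h)
    modified = (math.exp(lam * h) - 1) / h
    print(f"h={h}: learned theta={theta:.5f}, (e^(lam*h)-1)/h={modified:.5f}, true lam={lam}")
```

The learned coefficient tracks the modified value at every step size and only approaches the true lam as h goes to 0, which is the discretization-error term in point (i).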
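We propose volume-preserving networks (VPNets) for learning unknown source-free dynamical systems from trajectory data. We propose three modules and combine them to obtain two network architectures, coined R-VPNet and LA-VPNet. The distinctive feature of the proposed models is that they are intrinsically volume-preserving. In addition, we prove the corresponding approximation theorems, which theoretically guarantee the expressivity of the proposed VPNets for learning source-free dynamics. Numerical experiments demonstrate the effectiveness, generalization ability, and structure-preserving property of VPNets.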
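Volume preservation means the learned map has Jacobian determinant 1 everywhere. One standard way to obtain that property by construction, assumed here and not claimed to be the R-VPNet or LA-VPNet modules, is to compose shear (additive coupling) layers such as (x, y) -> (x, y + f(x)), whose Jacobian is triangular with unit diagonal for any f. The sketch uses tiny hand-set tanh "networks" with hypothetical weights and checks the determinant numerically by central finite differences.

```python
import math


def shear_y(x, y, w=1.3, b=0.2):
    """(x, y) -> (x, y + tanh(w*x + b)); Jacobian [[1, 0], [*, 1]], det = 1."""
    return x, y + math.tanh(w * x + b)


def shear_x(x, y, w=-0.7, b=0.5):
    """(x, y) -> (x + tanh(w*y + b), y); Jacobian [[1, *], [0, 1]], det = 1."""
    return x + math.tanh(w * y + b), y


def vp_map(x, y):
    """Composition of shears, hence volume-preserving for any weights."""
    x, y = shear_y(x, y)
    x, y = shear_x(x, y)
    return shear_y(x, y, w=0.4, b=-0.1)


def jacobian_det(fn, x, y, eps=1e-6):
    """Central finite-difference estimate of det(J) of a 2D map fn at (x, y)."""
    (ax, ay), (bx, by) = fn(x + eps, y), fn(x - eps, y)
    (cx, cy), (dx, dy) = fn(x, y + eps), fn(x, y - eps)
    j11, j21 = (ax - bx) / (2 * eps), (ay - by) / (2 * eps)  # d/dx of (out_x, out_y)
    j12, j22 = (cx - dx) / (2 * eps), (cy - dy) / (2 * eps)  # d/dy of (out_x, out_y)
    return j11 * j22 - j12 * j21


for point in [(0.0, 0.0), (1.5, -2.0), (-0.3, 0.8)]:
    print(point, round(jacobian_det(vp_map, *point), 6))
```

Because each layer is also trivially invertible (subtract the same tanh term), stacking such layers keeps both invertibility and unit determinant regardless of how the weights are trained.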
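We develop a structural econometric model to capture the decision dynamics of human evaluators on an online micro-lending platform, and estimate the model parameters using a real-world dataset. We find two types of gender bias in human evaluators' decisions: preference-based bias and belief-based bias. Both types of bias favor female applicants. Through counterfactual simulations, we quantify the impact of gender bias on loan-granting outcomes and on the welfare of the company and the borrowers. Our results imply that the presence of either preference-based bias or belief-based bias reduces the company's profits. When the preference-based bias is removed, the company earns more profit. When the belief-based bias is removed, the company's profits also increase. Both increases result from a higher approval probability for borrowers, especially male borrowers, who eventually repay their loans. For borrowers, eliminating either bias decreases the gender gap in true positive rates in the credit risk evaluation. We also train machine learning algorithms on both the real-world data and the data from the counterfactual simulations. We compare the decisions made by these algorithms to see how evaluators' biases are inherited by the algorithms and reflected in machine-based decisions. We find that machine learning algorithms can mitigate both preference-based bias and belief-based bias.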
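The true-positive-rate gap mentioned above can be made concrete with a small sketch. It assumes that a "positive" is a borrower who eventually repays and that TPR is the share of those borrowers whom the evaluator approved; the records are invented toy data, not data from the study, and the helper names are hypothetical.

```python
def true_positive_rate(approved, repaid):
    """Share of borrowers who repaid (the positives) that were approved."""
    positives = [a for a, r in zip(approved, repaid) if r]
    return sum(positives) / len(positives)


def tpr_by_group(records):
    """records: iterable of (group, approved, repaid) with boolean flags."""
    groups = {}
    for group, approved, repaid in records:
        appr, rep = groups.setdefault(group, ([], []))
        appr.append(approved)
        rep.append(repaid)
    return {g: true_positive_rate(a, r) for g, (a, r) in groups.items()}


# Hypothetical toy decisions, not data from the study.
records = [
    ("female", True, True), ("female", True, True), ("female", True, True),
    ("female", False, True), ("female", True, False),
    ("male", True, True), ("male", True, True), ("male", False, True),
    ("male", False, True), ("male", False, False),
]
tpr = tpr_by_group(records)
gap = tpr["female"] - tpr["male"]
```

In this toy data, 3 of 4 repaying women but only 2 of 4 repaying men are approved, so the gap favors female applicants; removing a bias would show up as this gap shrinking.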
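Cross-domain recommendation can help alleviate the data sparsity issue in traditional sequential recommender systems. In this paper, we propose the RecGURU algorithmic framework to generate a Generalized User Representation (GUR) that incorporates user information across domains in sequential recommendation, even when the two domains share few or no common users. We propose a self-attentive autoencoder to derive latent user representations, and a domain discriminator that aims to predict the origin domain of a generated latent representation. We propose a novel adversarial learning method to train the two modules so that user embeddings generated from different domains are unified into a single global GUR for each user. The learned GUR captures the overall preferences and characteristics of a user and can therefore be used to augment the behavior data and improve recommendations in any single domain in which the user is involved. Extensive experiments are conducted on two public cross-domain recommendation datasets as well as a large dataset collected from real-world applications. The results show that RecGURU boosts performance and outperforms various state-of-the-art sequential recommendation and cross-domain recommendation methods. The collected data will be released to facilitate future research.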
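Bidirectional reflectance distribution functions (BRDFs) are pervasively used in computer graphics to produce realistic, physically based appearance. In recent years, several works have explored using neural networks to represent BRDFs, taking advantage of their high compression rate and their ability to fit highly complex functions. However, once represented, a BRDF is fixed and therefore lacks the flexibility to take part in follow-up operations. In this paper, we present a form of "Neural BRDF algebra" that addresses both the representation of BRDFs and operations on them. We propose a representation neural network that compresses BRDFs into latent vectors and represents them accurately. We further propose several operations that can be applied solely in the latent space, such as layering and interpolation. Spatial variation is straightforward to achieve by using textures of latent vectors. Furthermore, our representation can be efficiently evaluated and sampled, providing a competitive alternative to more expensive Monte Carlo layering approaches.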
Unsupervised domain adaptation (UDA) for semantic segmentation is a promising task that frees people from heavy annotation work. However, domain discrepancies in low-level image statistics and high-level contexts compromise segmentation performance on the target domain. A key idea for tackling this problem is to perform image-level and feature-level adaptation jointly. Unfortunately, the existing literature lacks such unified approaches for UDA tasks. This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation. Concretely, for image-level domain shifts, we propose a global photometric alignment module and a global texture alignment module that align images in the source and target domains in terms of image-level properties. For feature-level domain shifts, we perform global manifold alignment by projecting pixel features from both domains onto the feature manifold of the source domain; we further regularize category centers in the source domain through a category-oriented triplet loss and perform target-domain consistency regularization over augmented target-domain images. Experimental results demonstrate that our pipeline significantly outperforms previous methods. In the commonly tested GTA5$\rightarrow$Cityscapes task, our proposed method using DeepLabV3+ as the backbone surpasses the previous SOTA by 8%, achieving 58.2% mIoU.
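As a rough illustration of image-level photometric alignment, the sketch below matches each color channel of a source image to the global mean and standard deviation of a target-domain image. This simple statistics-matching scheme is an assumption for illustration; the paper's global photometric alignment module may work differently (for example, in another color space or with learned parameters). The function names are hypothetical.

```python
import statistics


def align_channel(values, tgt_mean, tgt_std):
    """Affinely map one channel so its mean/std equal the target statistics."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    scale = tgt_std / sd if sd > 0 else 1.0
    return [(v - mu) * scale + tgt_mean for v in values]


def photometric_align(src, tgt):
    """src, tgt: dict mapping channel name -> flat list of pixel values.

    Each source channel is matched to the corresponding target channel's
    global mean and standard deviation (no clipping is applied here).
    """
    return {
        ch: align_channel(vals, statistics.fmean(tgt[ch]), statistics.pstdev(tgt[ch]))
        for ch, vals in src.items()
    }


# Usage with tiny hypothetical "images" (two channels, four pixels each).
src = {"r": [0.1, 0.2, 0.3, 0.4], "g": [0.5, 0.5, 0.6, 0.6]}
tgt = {"r": [0.4, 0.6, 0.8, 1.0], "g": [0.2, 0.3, 0.2, 0.3]}
aligned = photometric_align(src, tgt)
```

Matching only first- and second-order statistics leaves spatial structure untouched, which is why a separate texture alignment step is still needed for the remaining image-level gap.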
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical for improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e., blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e., flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with low computational cost and higher consistency with human visual perception. For temporal artifacts, we improve the self-attention-based TimeSFormer to detect them. Based on the six types of PEAs, we propose a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM). Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.
Image-based virtual try-on aims to replace the clothing in a person image with a garment image (in-shop clothes), and has attracted increasing attention from the multimedia and computer vision communities. Prior methods successfully preserve the characteristics of clothing images; however, occlusion remains a pernicious obstacle to realistic virtual try-on. In this work, we first present a comprehensive analysis of occlusions and categorize them into two types: i) Inherent-Occlusion: the ghost of the former clothing still exists in the try-on image; ii) Acquired-Occlusion: the target clothing warps onto an unreasonable body part. Based on this in-depth analysis, we find that occlusions can be simulated by a novel semantically-guided mixup module, which generates semantic-specific occluded images that work together with the try-on images to facilitate training a de-occlusion try-on (DOC-VTON) framework. Specifically, DOC-VTON first performs sharpened semantic parsing on the try-on person. Aided by semantic guidance and a pose prior, textures of varying complexity are selectively blended with human parts in a copy-and-paste manner. Then, the Generative Module (GM) is responsible for synthesizing the final try-on image and jointly learning de-occlusion. Compared with state-of-the-art methods, DOC-VTON achieves better perceptual quality by reducing occlusion effects.
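The semantically-guided copy-and-paste step can be sketched as a mask-based blend: pixels whose parsing label belongs to a chosen set of body parts are replaced by (or, with alpha < 1, mixed with) texture pixels. The label ids, the `alpha` parameter, and the function name are hypothetical, and DOC-VTON's actual module also uses a pose prior, which this sketch omits.

```python
def semantic_mixup(person, texture, parsing, target_labels, alpha=1.0):
    """Blend texture into the pixels whose parsing label is in target_labels.

    person, texture: 2D lists of pixel values with identical shapes.
    parsing: 2D list of integer part labels (label ids are hypothetical).
    alpha=1.0 is a hard copy-and-paste; alpha < 1 mixes texture and person.
    Returns (occluded_image, mask).
    """
    mask = [[1 if lab in target_labels else 0 for lab in row] for row in parsing]
    out = [
        [m * (alpha * t + (1 - alpha) * p) + (1 - m) * p
         for p, t, m in zip(p_row, t_row, m_row)]
        for p_row, t_row, m_row in zip(person, texture, mask)
    ]
    return out, mask


# Usage: paste texture onto the (hypothetical) part label 1 only.
person = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
texture = [[8.0, 8.0, 8.0], [8.0, 8.0, 8.0]]
parsing = [[0, 1, 2], [2, 1, 0]]
occluded, mask = semantic_mixup(person, texture, parsing, {1})
```

The returned mask marks exactly where the synthetic occlusion was placed, which is the kind of supervision a de-occlusion generator can be trained against.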